Learning Count Classifier Preferences of Malay Nouns

Authors

  • Jeremy Nicholson
  • Timothy Baldwin
Abstract

We develop a data set of Malay lexemes labelled with count classifiers that are attested in raw or lemmatised corpora. A maximum entropy classifier based on simple, language-inspecific features generated from context tokens achieves about 50% F-score, or about 65% precision when a suite of binary classifiers is built to aid multi-class prediction of headword nouns. Surprisingly, numeric features are not observed to aid classification. This system represents a useful step for semi-supervised lexicography across a range of languages.

Similar resources

Web and Corpus Methods for Malay Count Classifier Prediction

We examine the capacity of Web and corpus frequency methods to predict preferred count classifiers for nouns in Malay. The observed F-score for the Web model of 0.671 considerably outperformed corpus-based frequency and machine learning models. We expect that this is a fruitful extension for Web-as-corpus approaches to lexicons in languages other than English, but further research is required i...

Multilingual Generation of Numeral Classifiers using a Common Ontology

In this paper, we present a solution to the problem of generating both Japanese and Korean numeral classifiers using semantic classes from an ontology. Most nouns must use a numeral classifier when they are quantified in languages such as Chinese, Japanese, Korean, Malay and Thai. In order to select an appropriate classifier, we propose an algorithm which associates classifiers with semantic cl...

Individuation and Quantification: Do bare nouns in Mandarin Chinese individuate?

Some have proposed that speakers of classifier languages such as Mandarin or Japanese, which lack count-mass syntax, have to rely on classifiers for acquiring individuated meanings of nouns (e.g., Borer 2005; Lucy 1992). This paper examines this view by looking at how Mandarin adults interpret bare nouns and use classifier knowledge to guide quantification in three experiments. Experiment 1 fou...

The Count-Mass Distinction of Abstract Nouns in Mandarin Chinese

The issue of whether nouns in Mandarin Chinese can be distinguished into count and mass nouns has been debated in recent literature. Unlike English, Mandarin Chinese is a language where nouns are not obviously count nouns or mass nouns. In fact, syntactically nouns in Mandarin are similar to mass nouns in English, as they cannot combine directly with numerals, but must combine with classifiers;...

Learning Subjective Nouns using Extraction Pattern Bootstrapping. 2003 Conference on Natural Language Learning (CoNLL-03), ACL SIGNLL

We explore the idea of creating a subjectivity classifier that uses lists of subjective nouns learned by bootstrapping algorithms. The goal of our research is to develop a system that can distinguish subjective sentences from objective sentences. First, we use two bootstrapping algorithms that exploit extraction patterns to learn sets of subjective nouns. Then we train a Naive Bayes classifier ...

Publication date: 2008